Total Variation Distance

In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance, statistical difference or variational distance.


Definition

Consider a measurable space (\Omega, \mathcal{F}) and probability measures P and Q defined on (\Omega, \mathcal{F}). The total variation distance between P and Q is defined as

:\delta(P,Q) = \sup_{A \in \mathcal{F}} \left| P(A) - Q(A) \right|.

Informally, this is the largest possible difference between the probabilities that the two probability distributions can assign to the same event.
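For a finite sample space the supremum can be computed directly from the definition by enumerating every event. The sketch below does exactly that, with hypothetical example distributions (a fair die against a loaded one); it is a brute-force illustration, not an efficient method.

```python
from itertools import combinations

def tv_distance(p, q):
    """Total variation distance between two finite distributions,
    given as dicts mapping outcomes to probabilities, computed
    straight from the definition: sup over events A of |P(A) - Q(A)|."""
    outcomes = sorted(set(p) | set(q))
    best = 0.0
    # Enumerate every event A, i.e. every subset of the sample space.
    for r in range(len(outcomes) + 1):
        for event in combinations(outcomes, r):
            pa = sum(p.get(x, 0.0) for x in event)
            qa = sum(q.get(x, 0.0) for x in event)
            best = max(best, abs(pa - qa))
    return best

# Hypothetical example: fair die vs. loaded die.
p = {i: 1 / 6 for i in range(1, 7)}
q = {1: 0.5, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.1}
print(tv_distance(p, q))  # ≈ 0.3333, attained e.g. by the event {1}
```

The loop over all 2^|Ω| subsets is exponential; the L1 identity given below under "Relation to other distances" reduces the computation to a single pass over outcomes.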


Properties


Relation to other distances

The total variation distance is related to the Kullback–Leibler divergence by Pinsker's inequality:

:\delta(P,Q) \le \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P \parallel Q)}.

One also has the following inequality, due to Bretagnolle and Huber (see also Tsybakov), which has the advantage of providing a non-vacuous bound even when D_{\mathrm{KL}}(P \parallel Q) > 2:

:\delta(P,Q) \le \sqrt{1 - e^{-D_{\mathrm{KL}}(P \parallel Q)}}.

When \Omega is countable, the total variation distance is related to the L1 norm by the identity

:\delta(P,Q) = \frac{1}{2} \| P - Q \|_1 = \frac{1}{2} \sum_{\omega \in \Omega} \left| P(\{\omega\}) - Q(\{\omega\}) \right|.

The total variation distance is related to the Hellinger distance H(P,Q) as follows:

:H^2(P,Q) \leq \delta(P,Q) \leq \sqrt{2}\, H(P,Q).

These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
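The bounds above can be checked numerically on a small example. The sketch below uses the half-L1 identity for the total variation distance and the convention H^2(P,Q) = \tfrac{1}{2}\sum_\omega (\sqrt{P(\{\omega\})} - \sqrt{Q(\{\omega\})})^2 for the Hellinger distance; the two-point distributions are hypothetical examples.

```python
import math

def tv(p, q):
    # Total variation via the half-L1 identity for countable spaces.
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def kl(p, q):
    # Kullback-Leibler divergence in nats; assumes supp(p) ⊆ supp(q).
    return sum(px * math.log(px / q[k]) for k, px in p.items() if px > 0)

def hellinger(p, q):
    # Hellinger distance with the 1/2 normalisation, so 0 <= H <= 1.
    keys = set(p) | set(q)
    return math.sqrt(0.5 * sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2
        for k in keys))

# Hypothetical two-point example distributions.
p = {0: 0.2, 1: 0.8}
q = {0: 0.6, 1: 0.4}
d, D, H = tv(p, q), kl(p, q), hellinger(p, q)

assert d <= math.sqrt(D / 2)             # Pinsker's inequality
assert d <= math.sqrt(1 - math.exp(-D))  # Bretagnolle-Huber bound
assert H ** 2 <= d <= math.sqrt(2) * H   # Hellinger sandwich
print(d, D, H)
```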


Connection to transportation theory

The total variation distance (or half the L1 norm) arises as the optimal transportation cost when the cost function is c(x,y) = \mathbf{1}_{x \neq y}; that is,

:\frac{1}{2} \| P - Q \|_1 = \delta(P,Q) = \inf \left\{ \Pr(X \neq Y) : (X,Y) \text{ is a coupling of } P \text{ and } Q \right\} = \inf_\pi \operatorname{E}_\pi \left[ \mathbf{1}_{x \neq y} \right],

where the expectation is taken with respect to the probability measure \pi on the space where (x,y) lives, and the infimum is taken over all such \pi with marginals P and Q, respectively.
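For discrete distributions the infimum is attained by the so-called maximal coupling, which places as much mass as possible on the diagonal (where X = Y). The sketch below constructs one such coupling, under the assumption of finite support, and checks that its mismatch probability equals the total variation distance; the example distributions are hypothetical.

```python
def tv(p, q):
    # Total variation via the half-L1 identity.
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def maximal_coupling(p, q):
    """Build a coupling pi of p and q attaining Pr[X != Y] = delta(P, Q):
    shared mass min(p, q) on the diagonal, residual mass off-diagonal."""
    keys = sorted(set(p) | set(q))
    pi = {}
    # Shared mass goes on the diagonal (X == Y).
    for k in keys:
        m = min(p.get(k, 0.0), q.get(k, 0.0))
        if m > 0:
            pi[(k, k)] = m
    # Residual mass of each marginal; each residual sums to delta(P, Q),
    # and their supports are disjoint, so pairing them is off-diagonal.
    rp = {k: p.get(k, 0.0) - min(p.get(k, 0.0), q.get(k, 0.0)) for k in keys}
    rq = {k: q.get(k, 0.0) - min(p.get(k, 0.0), q.get(k, 0.0)) for k in keys}
    delta = sum(rp.values())
    if delta > 0:
        for a, ma in rp.items():
            for b, mb in rq.items():
                if ma > 0 and mb > 0:
                    pi[(a, b)] = pi.get((a, b), 0.0) + ma * mb / delta
    return pi

# Hypothetical example distributions.
p = {0: 0.2, 1: 0.8}
q = {0: 0.6, 1: 0.4}
pi = maximal_coupling(p, q)
mismatch = sum(m for (x, y), m in pi.items() if x != y)
print(round(mismatch, 6), round(tv(p, q), 6))  # equal: 0.4 0.4
```

Any other coupling of P and Q puts at least this much mass off the diagonal, which is exactly the infimum formulation above.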


See also

* Total variation
* Kolmogorov–Smirnov test
* Wasserstein metric

